Crate unicode_width
Determine displayed width of char and str types according to Unicode Standard Annex #11, other portions of the Unicode standard, and common implementations of POSIX wcwidth().
See the Rules for determining width section for the exact rules.
This crate is #![no_std].
use unicode_width::UnicodeWidthStr;

let teststr = "Hello, world!";
// Width in a non-East-Asian context.
let width = UnicodeWidthStr::width(teststr);
println!("{}", teststr);
println!("The above string is {} columns wide.", width);
// Width in an East Asian context, where Ambiguous characters take 2 columns.
let width = teststr.width_cjk();
println!("The above string is {} columns wide (CJK).", width);
§Rules for determining width
This crate currently uses the following rules to determine the width of a character or string, in order of decreasing precedence. These may be tweaked in the future.
- Emoji presentation sequences have width 2. (The width of a string may therefore differ from the sum of the widths of its characters.)
- '\u{00AD}' SOFT HYPHEN has width 1.
- '\u{115F}' HANGUL CHOSEONG FILLER has width 2.
- The following have width 0:
  - Characters with the Default_Ignorable_Code_Point property.
  - Characters with the Grapheme_Extend property.
  - The following 8 characters, all of which have NFD decompositions consisting of two Grapheme_Extend characters:
    - '\u{0CC0}' KANNADA VOWEL SIGN II,
    - '\u{0CC7}' KANNADA VOWEL SIGN EE,
    - '\u{0CC8}' KANNADA VOWEL SIGN AI,
    - '\u{0CCA}' KANNADA VOWEL SIGN O,
    - '\u{0CCB}' KANNADA VOWEL SIGN OO,
    - '\u{1B3B}' BALINESE VOWEL SIGN RA REPA TEDUNG,
    - '\u{1B3D}' BALINESE VOWEL SIGN LA LENGA TEDUNG, and
    - '\u{1B43}' BALINESE VOWEL SIGN PEPET TEDUNG.
  - Characters with a Hangul_Syllable_Type of Vowel_Jamo (V) or Trailing_Jamo (T).
  - '\0' NUL.
- The control characters have no defined width, and are ignored when determining the width of a string.
- Characters with an East_Asian_Width of Fullwidth (F) or Wide (W) have width 2.
- Characters with an East_Asian_Width of Ambiguous (A) have width 2 in an East Asian context, and width 1 otherwise.
- All other characters have width 1.
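The precedence order of the rules above can be illustrated with a small, self-contained sketch. This is not the crate's implementation: `sketch_char_width` and `sketch_str_width` are hypothetical names, and the code point ranges are a tiny hand-picked sample of each property rather than the full Unicode data. It only shows how earlier rules win over later ones, and why a string's width can differ from the sum of its characters' widths.

```rust
// Simplified sketch of the precedence rules. The ranges are small samples,
// NOT the complete Unicode property tables.

/// Hypothetical helper: width of one character, or `None` for control
/// characters, which have no defined width.
fn sketch_char_width(c: char, cjk: bool) -> Option<usize> {
    match c as u32 {
        // Explicit exceptions are checked first (highest precedence here).
        0x00AD => Some(1), // SOFT HYPHEN
        0x115F => Some(2), // HANGUL CHOSEONG FILLER
        // NUL is listed under width 0, so it is matched before the control rule.
        0x0000 => Some(0),
        // Sample of Default_Ignorable_Code_Point (includes variation selectors).
        0x200B..=0x200F | 0xFE00..=0xFE0F => Some(0),
        // Sample of Grapheme_Extend: combining diacritical marks.
        0x0300..=0x036F => Some(0),
        // The eight explicitly listed Kannada and Balinese vowel signs.
        0x0CC0 | 0x0CC7 | 0x0CC8 | 0x0CCA | 0x0CCB | 0x1B3B | 0x1B3D | 0x1B43 => Some(0),
        // Hangul vowel and trailing jamo (main Hangul Jamo block only).
        0x1160..=0x11FF => Some(0),
        // Remaining control characters: no defined width.
        0x0001..=0x001F | 0x007F..=0x009F => None,
        // Sample of Wide / Fullwidth characters.
        0x4E00..=0x9FFF | 0xAC00..=0xD7A3 | 0xFF01..=0xFF60 => Some(2),
        // Sample of Ambiguous characters: width depends on context.
        0x00A1 | 0x00A7 | 0x00B1 => Some(if cjk { 2 } else { 1 }),
        // Everything else.
        _ => Some(1),
    }
}

/// Hypothetical helper: width of a string. A sample base character followed
/// by U+FE0F is treated as an emoji presentation sequence of width 2.
fn sketch_str_width(s: &str, cjk: bool) -> usize {
    let mut total = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        // Assumption: only HEAVY BLACK HEART is recognized as an emoji base.
        let is_sample_emoji_base = c == '\u{2764}';
        if is_sample_emoji_base && chars.peek() == Some(&'\u{FE0F}') {
            chars.next(); // consume the variation selector
            total += 2;
        } else {
            // Control characters (`None`) are ignored.
            total += sketch_char_width(c, cjk).unwrap_or(0);
        }
    }
    total
}
```

In this sketch, `sketch_str_width("\u{2764}\u{FE0F}", false)` yields 2 even though the two characters individually contribute 1 and 0, mirroring the note on emoji presentation sequences in the first rule.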
§Canonical equivalence
The non-CJK width methods guarantee that canonically equivalent strings are assigned the same width. However, this guarantee does not currently hold for the CJK width variants.
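To see why zero-width combining marks matter for this guarantee, consider a toy width function. `toy_width` is a hypothetical name used only for illustration, and it treats just the combining diacritical marks block (a small slice of Grapheme_Extend) as width 0. Under that assumption, the precomposed and decomposed spellings of "é", which are canonically equivalent, come out the same.

```rust
/// Hypothetical toy width: combining diacritical marks U+0300..=U+036F count
/// as 0 columns, every other character as 1. This is far simpler than the
/// crate's real rules.
fn toy_width(s: &str) -> usize {
    s.chars()
        .filter(|c| !('\u{0300}'..='\u{036F}').contains(c))
        .count()
}

/// Compare the precomposed and decomposed forms of "é".
fn equivalent_forms_have_same_width() -> bool {
    let precomposed = "\u{00E9}"; // LATIN SMALL LETTER E WITH ACUTE
    let decomposed = "e\u{0301}"; // 'e' followed by COMBINING ACUTE ACCENT
    toy_width(precomposed) == toy_width(decomposed)
}
```

The same reasoning is presumably behind the eight explicitly listed vowel signs in the width rules: their NFD decompositions consist of two Grapheme_Extend characters, so the composed form has to be width 0 as well for equivalent strings to agree.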
Constants§
- UNICODE_VERSION: The version of Unicode that this version of unicode-width is based on.
Traits§
- UnicodeWidthChar: Methods for determining displayed width of Unicode characters.
- UnicodeWidthStr: Methods for determining displayed width of Unicode strings.